Fix parallel rustc not being reproducible due to unstable sorts of items #144722

ywxt · 2025-07-31T08:11:42Z

Currently, a tuple (DefId, SymbolName) is used to determine the order of items in the final binary. However, DefId is expected to be non-deterministic, which leads to non-reproducible output under parallel compilation. (See #140425 (comment))

Theoretically, we don't need the sorting because the order of these items is already deterministic.

However, codegen tests rely on items appearing in the same order in the binary as in the source.

So this PR adds a new option, codegen-source-order, that controls whether items are sorted by their order in the source. For codegen tests, items are sorted according to the order in the source code, whereas in the normal path, no sorting is performed.

Specifically, for codegen tests, in preparation for parallel compilation potentially being enabled by default in the future, we use Span instead of DefId to make the order deterministic.
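
To illustrate the difference between the two sort keys, here is a minimal sketch. It does not use rustc's real DefId, Span, or SymbolName types: `ToyItem`, its fields, and both sort functions are hypothetical stand-ins. It assumes only what is described above, namely that ids may be handed out in a different order between runs while source positions stay fixed.

```rust
// Toy stand-ins only: none of these are rustc's real types.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToyItem {
    /// Hypothetical stand-in for a DefId index; allocation order may vary between runs.
    id: u32,
    /// Hypothetical stand-in for a Span; byte offset of the item in the source file.
    span_lo: u32,
    /// Stand-in for the symbol name, used as a tie-breaker.
    symbol: &'static str,
}

/// Sort by an id-based key, mirroring the old (DefId, SymbolName) tuple.
fn sort_by_id(items: &mut Vec<ToyItem>) {
    items.sort_by_key(|item| (item.id, item.symbol));
}

/// Sort by a source-position key, mirroring the (Span, SymbolName) idea.
fn sort_by_source_order(items: &mut Vec<ToyItem>) {
    items.sort_by_key(|item| (item.span_lo, item.symbol));
}

fn symbols(items: &[ToyItem]) -> Vec<&'static str> {
    items.iter().map(|item| item.symbol).collect()
}

/// Two "runs" over the same source in which ids were assigned in a different order.
fn two_runs() -> (Vec<ToyItem>, Vec<ToyItem>) {
    let run_a = vec![
        ToyItem { id: 0, span_lo: 10, symbol: "foo" },
        ToyItem { id: 1, span_lo: 40, symbol: "bar" },
    ];
    let run_b = vec![
        ToyItem { id: 1, span_lo: 10, symbol: "foo" },
        ToyItem { id: 0, span_lo: 40, symbol: "bar" },
    ];
    (run_a, run_b)
}

fn main() {
    // With the id-based key, the two runs disagree on the item order.
    let (mut a, mut b) = two_runs();
    sort_by_id(&mut a);
    sort_by_id(&mut b);
    println!("id key:     {:?} vs {:?}", symbols(&a), symbols(&b));

    // With the source-position key, both runs agree.
    let (mut a, mut b) = two_runs();
    sort_by_source_order(&mut a);
    sort_by_source_order(&mut b);
    println!("source key: {:?} vs {:?}", symbols(&a), symbols(&b));
}
```

In this toy model the id-based key orders the two runs differently, while the source-position key gives the same order in both.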

This PR is intended to fix #140425, but it seems to fix #140413 too.

This behavior won't be covered by any test until we have a test suite for the parallel frontend. (See #143953)

Related discussion: Zulip #144576

Update #113349

r? @oli-obk
cc @lqd @cramertj @matthiaskrgr @Zoxc @SparrowLii @bjorn3 @cjgillot @joshtriplett

rustbot · 2025-07-31T08:11:46Z

oli-obk is not on the review rotation at the moment.
They may take a while to respond.

rustbot · 2025-07-31T08:11:49Z

Some changes occurred in src/tools/compiletest

cc @jieyouxu

ywxt · 2025-07-31T09:37:37Z

I don't think the changes in diagnostic order are expected. 😅

cjgillot · 2025-08-03T20:29:28Z

compiler/rustc_middle/src/mir/mono.rs

-                    MonoItem::Static(def_id) => def_id.as_local().map(Idx::index),
-                    MonoItem::GlobalAsm(item_id) => Some(item_id.owner_id.def_id.index()),
-                },
+                local_item_query(item, |def_id| tcx.def_span(def_id)),


Should we try to use def_ident_span or find_ancestor_not_from_extern_macro to avoid having to shuffle tests?

Thanks for your advice. I'll try it. :)

~~def_ident_span failed on tests/assembly-llvm/emit-intel-att-syntax.rs about naked and global asm.~~

find_ancestor_not_from_macro works here.
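
As a rough illustration of why walking to a non-macro ancestor can help with ordering, here is a toy model. `ToySpan` and all of its methods are hypothetical and are not rustc's Span API. The sketch assumes (this is not confirmed in this thread) that a span produced by macro expansion points into the macro's body and records the invocation site as its parent.

```rust
/// Toy span, NOT rustc's Span. `call_site` is the span of the macro
/// invocation that produced this span, if there was one.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToySpan {
    lo: u32,
    call_site: Option<Box<ToySpan>>,
}

impl ToySpan {
    /// A span written directly in the source file.
    fn in_source(lo: u32) -> Self {
        ToySpan { lo, call_site: None }
    }

    /// A span that came out of a macro invoked at `call_site`.
    fn from_macro(lo: u32, call_site: ToySpan) -> Self {
        ToySpan { lo, call_site: Some(Box::new(call_site)) }
    }

    /// Walk outward through invocation sites until reaching a span
    /// that was written directly in the source.
    fn ancestor_not_from_macro(&self) -> &ToySpan {
        let mut current = self;
        while let Some(parent) = current.call_site.as_deref() {
            current = parent;
        }
        current
    }
}

/// Return item names ordered by the position that `key` extracts from each span.
fn order_by<F: Fn(&ToySpan) -> u32>(items: &[(&'static str, ToySpan)], key: F) -> Vec<&'static str> {
    let mut sorted: Vec<&(&'static str, ToySpan)> = items.iter().collect();
    sorted.sort_by_key(|(_, span)| key(span));
    sorted.into_iter().map(|(name, _)| *name).collect()
}

/// "generated" comes from a macro defined at offset 5 and invoked at offset 100;
/// "plain" is written directly at offset 50.
fn sample_items() -> Vec<(&'static str, ToySpan)> {
    vec![
        ("generated", ToySpan::from_macro(5, ToySpan::in_source(100))),
        ("plain", ToySpan::in_source(50)),
    ]
}

fn main() {
    let items = sample_items();
    // Raw positions place the generated item at the macro definition.
    println!("{:?}", order_by(&items, |span| span.lo));
    // Ancestor positions place it where the macro was invoked.
    println!("{:?}", order_by(&items, |span| span.ancestor_not_from_macro().lo));
}
```

Under these assumptions, sorting by the ancestor position keeps a macro-generated item where its invocation appears in the file, which is the order a test author would read.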

ywxt · 2025-08-04T06:24:51Z

If we don't sort the items again, it would break existing tests.

How about using (def_span().find_ancestor_not_from_macro(), SymbolName) to replace (DefId, SymbolName), instead of adding a new option?

Edit: querying the span causes a significant perf regression.
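
The trade-off can be sketched as follows: pay for span-based keys only when source order is requested, and leave the normal path unsorted. This is a minimal sketch, not the PR's actual code. `ToyItem`, `source_key`, `order_items`, and the `source_order` flag (standing in for codegen-source-order) are hypothetical names; the only real API used is the standard library's `sort_by_cached_key`, which calls the key function at most once per element.

```rust
use std::cell::Cell;

// Toy stand-in for a mono item; not a rustc type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToyItem {
    span_lo: u32,
    symbol: &'static str,
}

/// Hypothetical stand-in for an expensive span lookup; counts how often it runs.
fn source_key(item: &ToyItem, lookups: &Cell<usize>) -> (u32, &'static str) {
    lookups.set(lookups.get() + 1);
    (item.span_lo, item.symbol)
}

/// `source_order` stands in for the codegen-source-order option.
fn order_items(mut items: Vec<ToyItem>, source_order: bool, lookups: &Cell<usize>) -> Vec<ToyItem> {
    if source_order {
        // Keys are computed up front and cached, so the lookup runs
        // at most once per item rather than once per comparison.
        items.sort_by_cached_key(|item| source_key(item, lookups));
    }
    // Normal path: keep the incoming order and do no lookups at all.
    items
}

fn names(items: &[ToyItem]) -> Vec<&'static str> {
    items.iter().map(|item| item.symbol).collect()
}

fn sample() -> Vec<ToyItem> {
    vec![
        ToyItem { span_lo: 40, symbol: "bar" },
        ToyItem { span_lo: 10, symbol: "foo" },
        ToyItem { span_lo: 25, symbol: "baz" },
    ]
}

fn main() {
    let lookups = Cell::new(0);

    let normal = order_items(sample(), false, &lookups);
    println!("normal path: {:?}, lookups = {}", names(&normal), lookups.get());

    let sorted = order_items(sample(), true, &lookups);
    println!("source order: {:?}, lookups = {}", names(&sorted), lookups.get());
}
```

With the flag off, the toy lookup counter stays at zero, which mirrors the goal of keeping the span query off the normal compilation path.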

SparrowLii · 2025-08-06T06:09:26Z

@bors try @rust-timer queue

rust-bors · 2025-08-06T06:09:30Z

⌛ Trying commit 8a02371 with merge 77b6bc0…

To cancel the try build, run the command @bors try cancel.

rust-bors · 2025-08-06T08:23:41Z

☀️ Try build successful (CI)
Build commit: 77b6bc0 (77b6bc0c5f1162a75f7502ef975c859ece90ef4e, parent: ec7c02612527d185c379900b613311bc1dcbf7dc)

rust-timer · 2025-08-06T09:35:15Z

Finished benchmarking commit (77b6bc0): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.1%, 0.5%] | 2 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.2%] | 8 |
| Improvements ✅ (secondary) | -0.6% | [-1.6%, -0.0%] | 16 |
| All ❌✅ (primary) | -0.1% | [-0.3%, 0.5%] | 10 |

Max RSS (memory usage)

Results (primary -2.4%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [2.7%, 2.7%] | 1 |
| Regressions ❌ (secondary) | 2.1% | [1.6%, 2.6%] | 2 |
| Improvements ✅ (primary) | -3.2% | [-6.2%, -1.5%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.4% | [-6.2%, 2.7%] | 7 |

Cycles

Results (primary 3.4%, secondary 3.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.4% | [1.7%, 6.8%] | 17 |
| Regressions ❌ (secondary) | 3.5% | [1.8%, 5.0%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.4% | [1.7%, 6.8%] | 17 |

Binary size

Results (primary 0.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.1%] | 11 |
| Regressions ❌ (secondary) | 0.2% | [0.0%, 0.9%] | 47 |
| Improvements ✅ (primary) | -0.0% | [-0.0%, -0.0%] | 17 |
| Improvements ✅ (secondary) | -0.0% | [-0.1%, -0.0%] | 8 |
| All ❌✅ (primary) | 0.0% | [-0.0%, 0.1%] | 28 |

Bootstrap: 467.149s -> 473.005s (1.25%)
Artifact size: 377.42 MiB -> 377.35 MiB (-0.02%)

Fix parallel rustc not being reproducible due to unstable sorts of items try-job: apple

rust-bors · 2025-08-13T02:57:08Z

💔 Test for 4507c18 failed: CI. Failed job:

Calculate job matrix (web logs, extended logs)

rust-log-analyzer · 2025-08-13T02:57:11Z

A job failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

   Compiling ureq v3.0.8
   Compiling citool v0.1.0 (/home/runner/work/rust/rust/src/ci/citool)
    Finished `dev` profile [unoptimized] target(s) in 21.94s
     Running `target/debug/citool calculate-job-matrix`
Run type: TryJob { job_patterns: Some(["apple"]) }
Error: Failed to calculate job matrix

Caused by:
    Patterns `apple` did not match any auto jobs
##[error]Process completed with exit code 1.
Post job cleanup.

SparrowLii · 2025-08-13T03:00:43Z

@bors2 try jobs=build-aarch64-apple

Fix parallel rustc not being reproducible due to unstable sorts of items try-job: build-aarch64-apple

rust-bors · 2025-08-13T03:01:35Z

💔 Test for 7339c51 failed: CI. Failed job:

Calculate job matrix (web logs, extended logs)

rust-log-analyzer · 2025-08-13T03:01:37Z

A job failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)

   Compiling ureq v3.0.8
   Compiling citool v0.1.0 (/home/runner/work/rust/rust/src/ci/citool)
    Finished `dev` profile [unoptimized] target(s) in 22.49s
     Running `target/debug/citool calculate-job-matrix`
Run type: TryJob { job_patterns: Some(["build-aarch64-apple"]) }
Error: Failed to calculate job matrix

Caused by:
    Patterns `build-aarch64-apple` did not match any auto jobs
##[error]Process completed with exit code 1.

SparrowLii · 2025-08-13T03:08:51Z

@bors2 try jobs=dist-aarch64-apple

Fix parallel rustc not being reproducible due to unstable sorts of items try-job: dist-aarch64-apple

SparrowLii · 2025-08-13T03:14:34Z

@bors2 try cancel

rust-bors · 2025-08-13T03:14:37Z

Try build cancelled. Cancelled workflows:

https://github.com/rust-lang/rust/actions/runs/16926428141

SparrowLii · 2025-08-13T03:14:53Z

@bors2 try jobs=aarch64-apple

Fix parallel rustc not being reproducible due to unstable sorts of items try-job: aarch64-apple

rust-bors · 2025-08-13T04:57:07Z

☀️ Try build successful (CI)
Build commit: 094c028 (094c0287b7ea1b82434c48bb8511571906b131e7, parent: 1553adfe6884a8f6c28f5a673d3e605535ee0113)

SparrowLii · 2025-08-13T07:26:02Z

@bors r+

bors · 2025-08-13T07:26:04Z

📌 Commit bc8a521 has been approved by SparrowLii

It is now in the queue for this repository.

bors · 2025-08-13T10:39:18Z

⌛ Testing commit bc8a521 with merge 350d0ef...

bors · 2025-08-13T13:47:09Z

☀️ Test successful - checks-actions
Approved by: SparrowLii
Pushing 350d0ef to master...

github-actions · 2025-08-13T13:50:32Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 1c9952f (parent) -> 350d0ef (this PR)

Test differences

2 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 350d0ef0ec0493e6d21cfb265cb8211a0e74d766 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

dist-aarch64-linux: 8503.7s -> 6011.7s (-29.3%)
dist-aarch64-apple: 6363.1s -> 5244.3s (-17.6%)
x86_64-apple-2: 5030.0s -> 5693.1s (13.2%)
i686-gnu-1: 9191.5s -> 8243.1s (-10.3%)
aarch64-apple: 5697.6s -> 6266.3s (10.0%)
pr-check-1: 1846.5s -> 1670.4s (-9.5%)
tidy: 107.4s -> 116.0s (7.9%)
dist-apple-various: 5126.6s -> 5497.2s (7.2%)
x86_64-msvc-2: 6890.8s -> 7307.0s (6.0%)
dist-aarch64-msvc: 5333.7s -> 5652.9s (6.0%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-08-13T15:18:13Z

Finished benchmarking commit (350d0ef): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

If the regression was expected or you think it can be justified,
please write a comment with sufficient written justification, and add
@rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
If you think that you know of a way to resolve the regression, try to create
a new PR with a fix for the regression.
If you do not understand the regression or you think that it is just noise,
you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.2%, 0.5%] | 5 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.3%] | 7 |
| Improvements ✅ (primary) | -0.2% | [-0.6%, -0.1%] | 12 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-0.6%, 0.5%] | 17 |

Max RSS (memory usage)

Results (primary -2.0%, secondary 2.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.4%, 2.8%] | 2 |
| Improvements ✅ (primary) | -2.0% | [-2.8%, -1.2%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.0% | [-2.8%, -1.2%] | 2 |

Cycles

Results (primary 3.5%, secondary 3.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.5% | [1.4%, 6.6%] | 10 |
| Regressions ❌ (secondary) | 3.4% | [1.4%, 5.5%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 3.5% | [1.4%, 6.6%] | 10 |

Binary size

Results (primary 0.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.1%] | 13 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.2%] | 45 |
| Improvements ✅ (primary) | -0.0% | [-0.0%, -0.0%] | 17 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 6 |
| All ❌✅ (primary) | 0.0% | [-0.0%, 0.1%] | 30 |

Bootstrap: 465.78s -> 468.291s (0.54%)
Artifact size: 377.44 MiB -> 377.36 MiB (-0.02%)

rustbot assigned oli-obk Jul 31, 2025

ywxt mentioned this pull request Jul 31, 2025

Fix parallel rustc not being reproducible due to unstable sorts of items #144576

Closed